We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover $\hat{x}$ satisfying $\|x - \hat{x}\|_1 \le C \min_{k\text{-sparse } x'} \|x - x'\|_1$. It is known that there exist matrices A with this property that have only $O(k \log(n/k))$ rows.

In this paper we show that this bound is tight. 
Our bound holds even for the more general ran- 
domized version of the problem, where A is a 
random variable, and the recovery algorithm is 
required to work for any fixed x with constant 
probability (over A). 



1 Introduction 

In recent years, a new "linear" approach for ob- 
taining a succinct approximate representation of 
n-dimensional vectors (or signals) has been dis- 
covered. For any signal x, the representation 
is equal to Ax, where A is an m x n matrix, 
or possibly a random variable chosen from some 
distribution over such matrices. The vector Ax 
is often referred to as the measurement vector 
or sketch of x. Although m is typically much 
smaller than n, the sketch Ax contains plenty of 
useful information about the signal x. A par- 
ticularly useful and well-studied problem is that 
of stable sparse recovery: given Ax, recover a k-sparse vector $\hat{x}$ (i.e., having at most k non-zero components) such that

$$\|\hat{x} - x\|_p \le C \min_{k\text{-sparse } x'} \|x - x'\|_q \qquad (1)$$

for some norm parameters p and q and an approximation factor C = C(k). If the matrix A is random, then Equation (1) should hold for each x with some probability (say, 3/4). Sparse recovery has applications to numerous areas such as data stream computing [Mut03, Ind07] and compressed sensing [CRT06, Don06, DDT+08].

It is known that there exist matrices A and associated recovery algorithms that produce approximations $\hat{x}$ satisfying Equation (1) with p = q = 1 (i.e., the "$\ell_1/\ell_1$ guarantee"), constant C and sketch length $m = O(k \log(n/k))$. In particular, a random Gaussian matrix [CRT06]¹ or a random sparse binary matrix ([BGI+08], building on [CCFC04, CM05]) has this property with overwhelming probability. In comparison, using a non-linear approach, one can obtain a shorter sketch of length O(k): it suffices to store the k coefficients with the largest absolute values, together with their indices.
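As a small illustration of the non-linear baseline and of the right-hand side of the recovery guarantee, the following Python sketch keeps the k largest-magnitude coefficients and measures the remaining tail. The function names (`best_k_sparse`, `tail_error`, `satisfies_guarantee`) are illustrative choices, not notation from the paper.

```python
def best_k_sparse(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    top = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in top else 0.0 for i in range(len(x))]


def norm(v, p=1):
    """The l_p norm of a vector given as a list of numbers."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def tail_error(x, k, q=1):
    """min over k-sparse x' of ||x - x'||_q, attained by best_k_sparse."""
    return norm([a - b for a, b in zip(x, best_k_sparse(x, k))], q)


def satisfies_guarantee(x, x_hat, k, C, p=1, q=1, tol=1e-12):
    """Check ||x_hat - x||_p <= C * min_{k-sparse x'} ||x - x'||_q."""
    err = norm([a - b for a, b in zip(x_hat, x)], p)
    return err <= C * tail_error(x, k, q) + tol


x = [5.0, -3.0, 0.5, 0.25, 0.0]
head = best_k_sparse(x, 2)   # what the O(k)-size non-linear sketch stores
print(head, tail_error(x, 2))
```

The best k-term approximation trivially meets the guarantee with C = 1; the question studied here is how many linear measurements are needed to get within a constant factor of it.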

Surprisingly, it was not known whether the $O(k \log(n/k))$ bound for linear sketching could be improved upon in general, although such lower bounds were known to hold under certain restrictions (see Section 1.2 for a more detailed overview). This raised hope that the O(k) bound might be achievable even for general vectors x. Such a scheme would have been of major practical interest, since the sketch length determines the compression ratio, and for large n any extra log n factor worsens that ratio tenfold.

1 In fact, they even achieve a somewhat stronger $\ell_2/\ell_1$ guarantee, see Section 1.2.

In this paper we show that, unfortunately, such an improvement is not possible. We address two types of recovery schemes:

• A deterministic one, which involves a fixed 
matrix A and a recovery algorithm which 
work for all signals x. The aforementioned 
results of [CRT06] and others are examples 
of such schemes. 

• A randomized one, where the matrix A is 
chosen at random from some distribution, 
and for each signal x the recovery procedure 
is correct with constant probability (say, 
3/4). Some of the early schemes proposed in
the data stream literature (e.g., [CCFC04,
CM05]) belong to this category.

Our main result is that, even in the randomized case, the sketch length m must be at least $\Omega(k \log(n/k))$. By the aforementioned result of [CRT06] this bound is tight.

Thus, our results show that the linear com- 
pression is inherently more costly than the sim- 
ple non-linear approach. 

1.1 Our techniques 

On a high level, our approach is simple and nat- 
ural, and utilizes the packing approach: we show 
that any two "sufficiently" different vectors x 
and x' are mapped to images Ax and Ax' that
are "sufficiently" different themselves, which re- 
quires that the image space is "sufficiently" high- 
dimensional. However, the actual arguments are 
somewhat subtle. 

Consider first the (simpler) deterministic case. We focus on signals x = y + z, where y can be thought of as the "head" of the signal and z as the "tail". The "head" vectors y come from a set Y that is a binary error-correcting code, with a minimum distance $\Omega(k)$, where each codeword has weight k. On the other hand, the "tail" vectors z come from an $\ell_1$ ball (say B) with a radius that is a small fraction of k. It can be seen that for any two elements $y, y' \in Y$, the balls y + B and y' + B, as well as their images, must be disjoint. At the same time, since all vectors x live in a "large" $\ell_1$ ball B' of radius O(k), all images Ax must live in a set AB'. The key observation is that the set AB' is a scaled version of A(y + B) and therefore the ratios of their volumes can be bounded by the scaling factor to the power of the dimension m. Since the number of elements of Y is large, this gives a lower bound on m.

Unfortunately, the aforementioned approach does not seem to extend to the randomized case. A natural approach would be to use Yao's principle, and focus on showing a lower bound for a scenario where the matrix A is fixed while the vectors x = y + z are "random". However, this approach fails, in a very strong sense. Specifically, we are able to show that there is a distribution over matrices A with only O(k) rows so that for a fixed $y \in Y$ and z chosen uniformly at random from the small ball B, we can recover y from A(y + z) with high probability. In a nutshell, the reason is that a random vector from B has an $\ell_2$ norm that is much smaller than the $\ell_2$ norm of elements of Y (even though the $\ell_1$ norms are comparable). This means that the vector x is "almost" k-sparse (in the $\ell_2$ norm), which enables us to achieve the O(k) measurement bound.

Instead, we resort to an altogether different approach, via communication complexity [KN97]. We start by considering a "discrete" scenario where both the matrix A and the vectors x have entries restricted to the polynomial range $\{-n^c, \ldots, n^c\}$ for some c = O(1). In other words, we assume that the matrix and vector entries can be represented using O(log n) bits. In this setting we show the following: there is a method for encoding a sequence of $d = O(k \log(n/k) \log n)$ bits into a vector x, so that any sparse recovery algorithm can recover that sequence given Ax. Since each entry of Ax conveys only O(log n) bits, it follows that the number m of rows of A must be $\Omega(k \log(n/k))$.

The encoding is performed by taking

$$x = \sum_{j=1}^{\log n} D^j x_j,$$

where D = O(1) and the $x_j$'s are chosen from the error-correcting code Y defined as in the deterministic case. The intuition behind this approach is that a good $\ell_1/\ell_1$ approximation to x reveals most of the bits of $x_{\log n}$. This enables us to identify $x_{\log n}$ exactly using error correction. We could then compute $Ax - A D^{\log n} x_{\log n} = A(\sum_{j=1}^{\log n - 1} D^j x_j)$, and identify $x_{\log n - 1}, \ldots, x_1$ in a recursive manner. The only obstacle to completing this argument is that we would need the recovery algorithm to work for all $x_i$, which would require lower probability of algorithm failure (roughly 1/log n). To overcome this problem, we replace the encoding argument by a reduction from a related communication complexity problem called Augmented Indexing. This problem has been used in the data stream literature [CW09, KNW10] to prove lower bounds for linear algebra and norm estimation problems. Since the problem has communication complexity of $\Omega(d)$, the conclusion follows.

We apply the argument to arbitrary matrices A by representing them as a sum A' + A'', where A' has O(log n) bits of precision and A'' has "small" entries. We then show that A'x = A(x + s) for some s with $\|s\|_1 < n^{-\Omega(1)} \|x\|_1$. In the communication game, this means we can transmit A'x and recover $x_{\log n}$ from $A'(\sum_{j=1}^{\log n} D^j x_j) = A(\sum_{j=1}^{\log n} D^j x_j + s)$.

One catch is that s depends on A. The recov- 
ery algorithm is guaranteed to work with proba- 
bility 3/4 for any x, so it works with probability 
3/4 over any distribution on x independent of A. 
However, there is no guarantee about recovery of 
x + s when s depends on A (even if s is tiny). 
To deal with this, we choose a u uniformly from the $\ell_1$ ball of radius k. We can set $\|s\|_1 \ll k/n$, so x + u and x + u + s are distributions with o(1) statistical distance. Hence recovery from A(x + u + s) matches recovery from A(x + u) with probability at least 1 - o(1), and $\|u\|_1$ is small enough that successful recovery from A(x + u) identifies $x_{\log n}$. Hence we can recover $x_{\log n}$ from A(x + u + s) = A'x + Au with probability at least 3/4 - o(1) > 1/2, which means that the Augmented Indexing reduction applies to arbitrary matrices as well.

1.2 Related Work 

There have been a number of earlier works that have, directly or indirectly, shown lower bounds for various models of sparse recovery and certain classes of matrices and algorithms. Specifically, one of the most well-known recovery algorithms used in compressed sensing is $\ell_1$-minimization, where a signal $x \in \mathbb{R}^n$ measured by matrix A is reconstructed as

$$\hat{x} := \operatorname{argmin}_{x' :\, Ax' = Ax} \|x'\|_1.$$

Kashin and Temlyakov [KT07] (building on prior work on Gelfand width [GG84, Glu84, Kas77], see also [Don06]) gave a characterization of matrices A for which the above recovery algorithm yields the $\ell_2/\ell_1$ guarantee, i.e.,

$$\|x - \hat{x}\|_2 \le C k^{-1/2} \min_{k\text{-sparse } x'} \|x - x'\|_1$$

for some constant C, from which it can be shown that such an A must have $m = \Omega(k \log(n/k))$ rows.

Note that the $\ell_2/\ell_1$ guarantee is somewhat stronger than the $\ell_1/\ell_1$ guarantee investigated in this paper. Specifically, it is easy to observe that if the approximation $\hat{x}$ itself is required to be O(k)-sparse, then the $\ell_2/\ell_1$ guarantee implies the $\ell_1/\ell_1$ guarantee (with a somewhat higher approximation constant). For the sake of simplicity, in this paper we focus mostly on the $\ell_1/\ell_1$ guarantee. However, our lower bounds apply to the $\ell_2/\ell_1$ guarantee as well.

The results on Gelfand width can be also used 
to obtain lower bounds for general recovery al- 
gorithms (for the deterministic recovery case), 
as long as the sparsity parameter k is larger 
than some constant. This was explicitly stated
in [FPRU10]; see also [Don06].

On the other hand, instead of assuming a specific recovery algorithm, Wainwright [Wai07] assumes a specific (randomized) measurement matrix. More specifically, the author assumes a k-sparse binary signal $x \in \{0, \alpha\}^n$, for some $\alpha > 0$, to which is added i.i.d. standard Gaussian noise in each component. The author then shows that with a random Gaussian matrix A, with each entry also drawn i.i.d. from the standard Gaussian, we cannot hope to recover x from Ax with any sub-constant probability of error unless A has $m = \Omega(\frac{1}{\alpha^2} \log \frac{n}{k})$ rows. The author also shows that for $\alpha = \sqrt{1/k}$, this is tight, i.e., that $m = \Theta(k \log(n/k))$ is both necessary and sufficient. Although this is only a lower bound for a specific (random) matrix, it is a fairly powerful one and provides evidence that the often observed upper bound of $O(k \log(n/k))$ is likely tight.

More recently, Dai and Milenkovic [DM08], extending [EG88] and [FR99], showed an upper bound on superimposed codes that translates to a lower bound on the number of rows in a compressed sensing matrix that deals only with k-sparse signals but can tolerate measurement noise. Specifically, if we assume a k-sparse signal $x \in ([-t, t] \cap \mathbb{Z})^n$, and that arbitrary noise $\mu \in \mathbb{R}^n$ with $\|\mu\|_1 < d$ is added to the measurement vector Ax, then if exact recovery is still possible, A must have had $m \ge C k \log n / \log k$ rows, for some constant C = C(t, d) and sufficiently large n and k.²

2 Preliminaries 

In this paper we focus on recovering sparse approximations $\hat{x}$ that satisfy the following C-approximate $\ell_1/\ell_1$ guarantee with sparsity parameter k:

$$\|x - \hat{x}\|_1 \le C \min_{k\text{-sparse } x'} \|x - x'\|_1. \qquad (2)$$

We define a C-approximate deterministic recovery algorithm to be a pair $(A, \mathcal{A})$ where A is an m x n observation matrix and $\mathcal{A}$ is an algorithm that, for any x, maps Ax (called the sketch of x) to some $\hat{x}$ that satisfies Equation (2).

We define a C-approximate randomized recovery algorithm to be a pair $(A, \mathcal{A})$ where A is a random variable chosen from some distribution over m x n measurement matrices, and $\mathcal{A}$ is an algorithm which, for any x, maps a pair (A, Ax) to some $\hat{x}$ that satisfies Equation (2) with probability at least 3/4.

2 Here A is assumed to have its columns normalized to have $\ell_1$-norm 1. This is natural since otherwise we could simply scale A up to make the image points Ax arbitrarily far apart, effectively nullifying the noise.

We use $B_p^n(r)$ to denote the $\ell_p$ ball of radius r in $\mathbb{R}^n$; we skip the superscript n if it is clear from the context.

For any vector x, we use $\|x\|_0$ to denote the "$\ell_0$ norm of x", i.e., the number of non-zero entries in x.

3 Deterministic Lower Bound 

We will prove a lower bound on m for any C-approximate deterministic recovery algorithm. First we use a discrete volume bound (Lemma 3.1) to find a large set Y of points that are at least k apart from each other. Then we use another volume bound (Lemma 3.2) on the images of small $\ell_1$ balls around each point in Y. If m is too small, some two images collide. But the recovery algorithm, applied to a point in the collision, must yield an answer close to two points in Y. This is impossible, so m must be large.

Lemma 3.1. (Gilbert-Varshamov) For any $q, k \in \mathbb{Z}^+$, $\epsilon \in \mathbb{R}^+$ with $\epsilon < 1 - 1/q$, there exists a set $Y \subset \{0, 1\}^{qk}$ of binary vectors with exactly k ones, such that Y has minimum Hamming distance $2\epsilon k$ and

$$\log |Y| > (1 - H_q(\epsilon)) k \log q$$

where $H_q$ is the q-ary entropy function $H_q(x) = -x \log_q \frac{x}{q-1} - (1 - x) \log_q (1 - x)$.

See appendix for proof. 
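To make the statement of Lemma 3.1 concrete, here is a minimal Python sketch of the construction on a toy instance. It assumes a brute-force greedy search over all q-ary words (feasible only for tiny q and k), which yields a maximal code and therefore meets the Gilbert-Varshamov size bound; the helper names are illustrative and the greedy search is a stand-in for the existence argument, not the paper's proof.

```python
from itertools import product
from math import comb


def hamming(a, b):
    """Number of positions where the two words differ."""
    return sum(s != t for s, t in zip(a, b))


def greedy_code(q, k, d):
    """Greedily pick words in {0..q-1}^k that are pairwise at distance >= d."""
    code = []
    for word in product(range(q), repeat=k):
        if all(hamming(word, c) >= d for c in code):
            code.append(word)
    return code


def expand_to_binary(word, q):
    """Replace each symbol i by the standard basis vector e_i of length q."""
    bits = []
    for symbol in word:
        bits.extend(1 if t == symbol else 0 for t in range(q))
    return tuple(bits)


q, k, d = 3, 4, 2                      # d plays the role of eps * k, eps = 1/2
T = greedy_code(q, k, d)
Y = [expand_to_binary(w, q) for w in T]
# Gilbert bound: any maximal code has at least q^k / Vol(d - 1) words.
gilbert = q ** k / sum(comb(k, i) * (q - 1) ** i for i in range(d))
print(len(Y), gilbert)
```

Each expanded vector has length qk and exactly k ones, and two q-ary words that differ in t positions expand to binary vectors that differ in 2t positions, which is where the minimum distance $2\epsilon k$ comes from.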

Lemma 3.2. Take an m x n real matrix A, positive reals $\epsilon, p, \lambda$, and $Y \subset B_p^n(\lambda)$. If $|Y| > (1 + 1/\epsilon)^m$, then there exist $z, \bar{z} \in B_p^n(\epsilon\lambda)$ and $y, \bar{y} \in Y$ with $y \ne \bar{y}$ and $A(y + z) = A(\bar{y} + \bar{z})$.

Proof. If the statement is false, then the images of all |Y| balls $\{y + B_p^n(\epsilon\lambda) \mid y \in Y\}$ are disjoint. However, those balls all lie within $B_p^n((1 + \epsilon)\lambda)$, by the bound on the norm of Y. A volume argument gives the result, as follows.

Let $S = A B_p^n(1)$ be the image of the n-dimensional ball of radius 1 in m-dimensional space. This is a polytope with some volume V. The image of $B_p^n(\epsilon\lambda)$ is a linearly scaled S with volume $(\epsilon\lambda)^m V$, and the image of $B_p^n((1 + \epsilon)\lambda)$ is similar, with volume $((1 + \epsilon)\lambda)^m V$. If the images of the former are all disjoint and lie inside the latter, we have $|Y| (\epsilon\lambda)^m V \le ((1 + \epsilon)\lambda)^m V$, or $|Y| \le (1 + 1/\epsilon)^m$. If Y has more elements than this, the images of some two balls $y + B_p^n(\epsilon\lambda)$ and $\bar{y} + B_p^n(\epsilon\lambda)$ must intersect, implying the lemma. □

Theorem 3.1. Any C-approximate deterministic recovery algorithm must have

$$m \ge \frac{1 - H_{\lfloor n/k \rfloor}(1/2)}{\log(4 + 2C)} \, k \log \left\lfloor \frac{n}{k} \right\rfloor.$$

Proof. Let Y be a maximal set of k-sparse n-dimensional binary vectors with minimum Hamming distance k, and let $\gamma = \frac{1}{3 + 2C}$. By Lemma 3.1 with $q = \lfloor n/k \rfloor$ we have $\log |Y| \ge (1 - H_{\lfloor n/k \rfloor}(1/2)) k \log \lfloor n/k \rfloor$.

Suppose that the theorem is not true; then $m < \log |Y| / \log(4 + 2C) = \log |Y| / \log(1 + 1/\gamma)$, or $|Y| > (1 + 1/\gamma)^m$. Hence Lemma 3.2 gives us some $y, \bar{y} \in Y$ and $z, \bar{z} \in B_1(\gamma k)$ with $A(y + z) = A(\bar{y} + \bar{z})$.

Let w be the result of running the recovery algorithm on A(y + z). By the definition of a deterministic recovery algorithm, we have

$$\|y + z - w\|_1 \le C \min_{k\text{-sparse } y'} \|y + z - y'\|_1 \le C \|z\|_1,$$

$$\|y - w\|_1 \le \|y + z - w\|_1 + \|z\|_1 \le (1 + C) \|z\|_1 \le (1 + C) \gamma k = \frac{1 + C}{3 + 2C} k,$$

and similarly $\|\bar{y} - w\|_1 \le \frac{1 + C}{3 + 2C} k$, so

$$\|y - \bar{y}\|_1 \le \|y - w\|_1 + \|\bar{y} - w\|_1 \le \frac{2 + 2C}{3 + 2C} k < k.$$

But this contradicts the definition of Y, so m must be large enough for the guarantee to hold. □

Corollary 3.1. If C is a constant bounded away from zero, then $m = \Omega(k \log(n/k))$.
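For a feel of the numbers, the bound of Theorem 3.1 can be evaluated directly. The sketch below is only a calculator for the stated formula; it assumes n >= 2k so that q = floor(n/k) >= 2, and the function names are illustrative.

```python
import math


def q_ary_entropy(x, q):
    """H_q(x) = -x log_q(x/(q-1)) - (1-x) log_q(1-x), for 0 < x < 1."""
    return -x * math.log(x / (q - 1), q) - (1 - x) * math.log(1 - x, q)


def deterministic_row_bound(n, k, C):
    """Right-hand side of Theorem 3.1; the log base cancels in the ratio."""
    q = n // k   # assumes n >= 2k, so that q >= 2
    return (1 - q_ary_entropy(0.5, q)) * k * math.log(q) / math.log(4 + 2 * C)


for n in (2 ** 10, 2 ** 16, 2 ** 20):
    print(n, round(deterministic_row_bound(n, k=16, C=2.0), 1))
```

For q = 2 the entropy term $H_2(1/2)$ equals 1 and the bound degenerates to 0; the bound becomes meaningful once n/k is a large enough constant, in line with Corollary 3.1.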

4 Randomized Upper Bound 
for Uniform Noise 

The standard way to prove a randomized lower bound is to find a distribution of hard inputs, and to show that any deterministic algorithm is likely to fail on that distribution. In our context, we would like to define a "head" random variable y from a distribution Y and a "tail" random variable z from a distribution Z, such that any algorithm given the sketch of y + z must recover an incorrect y with non-negligible probability.

Using our deterministic bound as inspiration, we could take Y to be uniform over a set of k-sparse binary vectors of minimum Hamming distance k and Z to be uniform over the ball $B_1(\gamma k)$ for some constant $\gamma > 0$. Unfortunately, as the following theorem shows, one can actually perform a recovery of such vectors using only O(k) measurements; this is because $\|z\|_2$ is very small (namely, $O(k/\sqrt{n})$) with high probability.

Theorem 4.1. Let $Y \subset \mathbb{R}^n$ be a set of signals with the property that for every distinct $y_1, y_2 \in Y$, $\|y_1 - y_2\|_2 \ge r$, for some parameter r > 0. Consider "noisy signals" x = y + z, where $y \in Y$ and z is a "noise vector" chosen uniformly at random from $B_1(s)$, for another parameter s > 0. Then using an m x n Gaussian measurement matrix $A = (1/\sqrt{m})(g_{ij})$, where the $g_{ij}$'s are i.i.d. standard Gaussians, we can recover $y \in Y$ from A(y + z) with probability 1 - 1/n (where the probability is over both A and z), as long as

$$s \le O\left( \frac{r \, m^{1/2} \, n^{1/2 - 1/m}}{|Y|^{1/m} \log^{3/2} n} \right).$$

To prove the theorem we will need the following two lemmas.

Lemma 4.1. For any $\delta > 0$, $y_1, y_2 \in Y$, $y_1 \ne y_2$, and $z \in \mathbb{R}^n$, each of the following holds with probability at least $1 - \delta$:

• $\|A(y_1 - y_2)\|_2 \ge \frac{\delta^{1/m}}{3} \|y_1 - y_2\|_2$, and

• $\|Az\|_2 \le \left( \sqrt{(8/m) \log(1/\delta)} + 1 \right) \|z\|_2$.

See the appendix for the proof.

Lemma 4.2. A random vector z chosen uniformly from $B_1(s)$ satisfies

$$\Pr[\|z\|_2 > \alpha s \log n / \sqrt{n}] < 1/n^{\alpha - 1}.$$

See the appendix for the proof.
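Lemma 4.2 is easy to probe empirically. The sketch below draws one point uniformly from the $\ell_1$ ball and compares its $\ell_2$ norm with the threshold of the lemma for $\alpha = 3$. The sampler uses the standard exponential-normalization method for the solid simplex, which is an assumption of this example rather than something taken from the paper, and the logarithm is taken to be natural.

```python
import math
import random


def sample_l1_ball(n, s, rng):
    """Draw a point uniformly from the l_1 ball of radius s in R^n.

    n + 1 i.i.d. Exp(1) variables, normalized by their total, give a
    uniform point of the solid simplex (first n coordinates); independent
    random signs then spread it over the whole l_1 ball.
    """
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(e)
    return [rng.choice((-1.0, 1.0)) * s * e[i] / total for i in range(n)]


rng = random.Random(0)
n, s, alpha = 1000, 1.0, 3
z = sample_l1_ball(n, s, rng)
l1 = sum(abs(t) for t in z)
l2 = math.sqrt(sum(t * t for t in z))
threshold = alpha * s * math.log(n) / math.sqrt(n)   # bound from Lemma 4.2
print(l1, l2, threshold)
```

A typical draw has $\ell_1$ norm close to s but $\ell_2$ norm on the order of $s/\sqrt{n}$, far below the threshold, which is the gap that Theorem 4.1 exploits.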

Proof of theorem. In words, Lemma 4.1 says that A cannot bring faraway signal points too close together, and cannot blow up a small noise vector too much. Now, we already assumed the signals to be far apart, and Lemma 4.2 tells us that the noise is indeed small (in $\ell_2$ distance). The result is that in the image space, the noise is not enough to confuse different signals. Quantitatively, applying the second part of Lemma 4.1 with $\delta = 1/n^2$, and Lemma 4.2 with $\alpha = 3$, gives us

$$\|Az\|_2 \le O\left( \frac{\log^{1/2} n}{m^{1/2}} \right) \|z\|_2 \le O\left( \frac{s \log^{3/2} n}{(mn)^{1/2}} \right) \qquad (3)$$

with probability at least $1 - 2/n^2$. On the other hand, given signal $y_1 \in Y$, we know that every other signal $y_2 \in Y$ satisfies $\|y_1 - y_2\|_2 \ge r$, so by the first part of Lemma 4.1 with $\delta = 1/(2n|Y|)$, together with a union bound over every $y_2 \in Y$,

$$\|A(y_1 - y_2)\|_2 \ge \frac{\|y_1 - y_2\|_2}{3 (2n|Y|)^{1/m}} \ge \frac{r}{3 (2n|Y|)^{1/m}} \qquad (4)$$

holds for every $y_2 \in Y$, $y_2 \ne y_1$, simultaneously with probability 1 - 1/(2n).

Finally, observe that as long as $\|Az\|_2 < \|A(y_1 - y_2)\|_2 / 2$ for every competing signal $y_2 \in Y$, we are guaranteed that

$$\|A(y_1 + z) - Ay_1\|_2 = \|Az\|_2 < \|A(y_1 - y_2)\|_2 - \|Az\|_2 \le \|A(y_1 + z) - Ay_2\|_2$$

for every $y_2 \ne y_1$, so we can recover $y_1$ by simply returning the signal whose image is closest to our measurement point $A(y_1 + z)$ in $\ell_2$ distance. To achieve this, we can chain Equations (3) and (4) together (with a factor of 2), to see that

$$s \le O\left( \frac{r \, m^{1/2} \, n^{1/2 - 1/m}}{|Y|^{1/m} \log^{3/2} n} \right)$$

suffices. Our total probability of failure is at most $2/n^2 + 1/(2n) < 1/n$.
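The recovery rule used in the proof (return the signal whose image is closest to the measurement in $\ell_2$ distance) can be sketched in a few lines of Python. The signal set, dimensions and seed below are hypothetical toy choices, and the noise sampler is the standard exponential-normalization method for the $\ell_1$ ball, not something specified in the paper.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix of i.i.d. N(0, 1) entries, scaled by 1/sqrt(m)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]


def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def decode(A, sketch, Y):
    """Return the index of the y in Y whose image Ay is closest in l_2."""
    def dist2(y):
        return sum((a - b) ** 2 for a, b in zip(apply(A, y), sketch))
    return min(range(len(Y)), key=lambda i: dist2(Y[i]))


rng = random.Random(1)
n, k, m, s = 64, 4, 16, 1.0
# Hypothetical signal set: k-sparse binary vectors with disjoint supports.
Y = [[1.0 if k * c <= i < k * (c + 1) else 0.0 for i in range(n)] for c in range(8)]
# Noise drawn uniformly from the l_1 ball of radius s.
e = [rng.expovariate(1.0) for _ in range(n + 1)]
total = sum(e)
z = [rng.choice((-1.0, 1.0)) * s * t / total for t in e[:n]]

A = gaussian_matrix(m, n, rng)
truth = 5
sketch = apply(A, [a + b for a, b in zip(Y[truth], z)])
print(decode(A, sketch, Y))
```

In this toy setting the signals are at $\ell_2$ distance $\sqrt{8}$ from each other while the noise has $\ell_2$ norm at most 1 and typically far less, so the nearest-image rule should pick out the true signal.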



The main consequence of this theorem is that for the setup we used in Section 3 to prove a deterministic lower bound of $\Omega(k \log(n/k))$, if we simply draw the noise uniformly randomly from the same $\ell_1$ ball (in fact, even one with a much larger radius, namely, polynomial in n), this "hard distribution" can be defeated with just O(k) measurements:

Corollary 4.1. If Y is a set of binary k-sparse vectors, as in Section 3, and noise z is drawn uniformly at random from $B_1(s)$, then for any constant $\epsilon > 0$, $m = O(k/\epsilon)$ measurements suffice to recover any signal in Y with probability 1 - 1/n, as long as

$$s \le O\left( \frac{k^{3/2 + \epsilon} \, n^{1/2 - \epsilon}}{\log^{3/2} n} \right).$$

Proof. The parameters in this case are r = k and $|Y| \le \binom{n}{k} \le (ne/k)^k$, so by Theorem 4.1 it suffices to have

$$s \le O\left( \frac{k^{3/2 + k/m} \, n^{1/2 - (k+1)/m}}{\log^{3/2} n} \right).$$

Choosing $m = (k + 1)/\epsilon$ yields the corollary. □

5 Randomized Lower Bound 

Although it is possible to partially circumvent this obstacle by focusing our noise distribution on "high" $\ell_2$ norm, sparse vectors, we are able to obtain stronger results via a reduction from a communication game and the corresponding lower bound.

The communication game will show that a 
message Ax must have a large number of bits. 
To show that this implies a lower bound on the 
number of rows of A, we will need A to be dis- 
crete. Hence we first show that discretizing A 
does not change its recovery characteristics by 
much. 

5.1 Discretizing Matrices 

Before we discretize by rounding, we need to ensure that the matrix is well conditioned. We show that without loss of generality, the rows of A are orthonormal.

We can multiply A on the left by any invertible matrix to get another measurement matrix with the same recovery characteristics. If we consider the singular value decomposition $A = U \Sigma V^*$, where U and V are orthonormal and $\Sigma$ is 0 off the diagonal, this means that we can eliminate U and make the entries of $\Sigma$ be either 0 or 1. The result is a matrix consisting of m orthonormal rows. For such matrices, we prove the following:

Lemma 5.1. Consider any m x n matrix A with orthonormal rows. Let A' be the result of rounding A to b bits per entry. Then for any $v \in \mathbb{R}^n$ there exists an $s \in \mathbb{R}^n$ with A'v = A(v - s) and $\|s\|_1 < n^2 2^{-b} \|v\|_1$.

Proof. Let A'' = A - A' be the roundoff error when discretizing A to b bits, so each entry of A'' is less than $2^{-b}$. Then for any v and $s = A^T A'' v$, we have As = A''v and

$$\|s\|_1 = \|A^T A'' v\|_1 \le \sqrt{n} \, \|A'' v\|_1 \le m \sqrt{n} \, 2^{-b} \|v\|_1 \le n^2 2^{-b} \|v\|_1.$$

□
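The identity in Lemma 5.1 can be checked numerically on a small instance. The sketch below assumes a particular matrix with orthonormal rows (the first m rows of a scaled Sylvester Hadamard matrix, so n must be a power of 2) and models "rounding to b bits" as rounding each entry down to a multiple of $2^{-b}$; both are illustrative choices, not prescribed by the paper.

```python
import math


def hadamard_rows(n, m):
    """First m rows of the n x n Sylvester Hadamard matrix, scaled by 1/sqrt(n)."""
    H = [[1]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-t for t in r] for r in H]
    return [[t / math.sqrt(n) for t in row] for row in H[:m]]


def matvec(M, v):
    """Matrix-vector product Mv."""
    return [sum(a * t for a, t in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


n, m, b = 8, 3, 10
A = hadamard_rows(n, m)
# Round every entry down to a multiple of 2^-b, so 0 <= A'' < 2^-b entrywise.
A1 = [[math.floor(a * 2 ** b) / 2 ** b for a in row] for row in A]
A2 = [[a - a1 for a, a1 in zip(r, r1)] for r, r1 in zip(A, A1)]

v = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
s = matvec(transpose(A), matvec(A2, v))          # s = A^T A'' v
lhs = matvec(A1, v)                              # A' v
rhs = matvec(A, [a - c for a, c in zip(v, s)])   # A (v - s)
gap = max(abs(a - c) for a, c in zip(lhs, rhs))
s_l1 = sum(abs(t) for t in s)
bound = n ** 2 * 2.0 ** (-b) * sum(abs(t) for t in v)
print(gap, s_l1, bound)
```

Up to floating-point error the two sides agree, and the $\ell_1$ norm of s sits well below the $n^2 2^{-b} \|v\|_1$ bound, reflecting the slack in the chain of inequalities.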

5.2 Communication Complexity 

We use a few definitions and results from two-party communication complexity. For further background see the book by Kushilevitz and Nisan [KN97]. Consider the following communication game. There are two parties, Alice and Bob. Alice is given a string $y \in \{0, 1\}^d$. Bob is given an index $i \in [d]$, together with $y_{i+1}, y_{i+2}, \ldots, y_d$. The parties also share an arbitrarily long common random string r. Alice sends a single message M(y, r) to Bob, who must output $y_i$ with probability at least 2/3, where the probability is taken over r. We refer to this problem as Augmented Indexing. The communication cost of Augmented Indexing is the minimum, over all correct protocols, of the length of the message M(y, r) on the worst-case choice of r and y.

The next theorem is well-known and follows
from Lemma 13 of [MNSW98] (see also Lemma
2 of [BYJKK04]).



Theorem 5.1. The communication cost of Augmented Indexing is $\Omega(d)$.

Proof. First, consider the private-coin version of the problem, in which both parties can toss coins, but do not share a random string r (i.e., there is no public coin). Consider any correct protocol for this problem. We can assume the probability of error of the protocol is an arbitrarily small positive constant by increasing the length of Alice's message by a constant factor (e.g., by independent repetition and a majority vote). Applying Lemma 13 of [MNSW98] (with, in their notation, t = 1 and $a = c' \cdot d$ for a sufficiently small constant c' > 0), the communication cost of such a protocol must be $\Omega(d)$. Indeed, otherwise there would be a protocol in which Bob could output $y_i$ with probability greater than 1/2 without any interaction with Alice, contradicting that $\Pr[y_i = 1] = 1/2$ and that Bob has no information about $y_i$. Our theorem now follows from Newman's theorem (see, e.g., Theorem 2.4 of [KNR99]), which shows that the communication cost of the best public coin protocol is at least that of the private coin protocol minus O(log d) (which also holds for one-round protocols). □

5.3 Randomized Lower Bound Theorem

Theorem 5.2. For any randomized $\ell_1/\ell_1$ recovery algorithm $(A, \mathcal{A})$, with approximation factor C = O(1), A must have $m = \Omega(k \log(n/k))$ rows.

Proof. We shall assume, without loss of gener- 
ality, that n and k are powers of 2, that k di- 
vides n, and that the rows of A are orthonormal. 
The proof for the general case follows with minor 
modifications. 

Let $(A, \mathcal{A})$ be such a recovery algorithm. We will show how to solve the Augmented Indexing problem on instances of size $d = \Omega(k \log(n/k) \log n)$ with communication cost O(m log n). The theorem will then follow by Theorem 5.1.

Let X be a maximal set of k-sparse n-dimensional binary vectors with minimum Hamming distance k. From Lemma 3.1 we have $\log |X| = \Omega(k \log(n/k))$. Let $d = \lfloor \log |X| \rfloor \log n$, and define D = 2C + 3.

Alice is given a string $y \in \{0, 1\}^d$, and Bob is given $i \in [d]$ together with $y_{i+1}, y_{i+2}, \ldots, y_d$, as in the setup for Augmented Indexing.

Alice splits her string y into log n contiguous chunks $y^1, y^2, \ldots, y^{\log n}$, each containing $\lfloor \log |X| \rfloor$ bits. She uses $y^j$ as an index into X to choose $x_j$. Alice defines

$$x = D^1 x_1 + D^2 x_2 + \cdots + D^{\log n} x_{\log n}.$$

Alice and Bob use the common randomness r to agree upon a random matrix A with orthonormal rows. Both Alice and Bob round A to form A' with $b = \lceil (4 + 2 \log D) \log n \rceil = O(\log n)$ bits per entry. Alice computes A'x and transmits it to Bob.

From Bob's input i, he can compute the value j = j(i) for which the bit $y_i$ occurs in $y^j$. Bob's input also contains $y_{i+1}, \ldots, y_d$, from which he can reconstruct $x_{j+1}, \ldots, x_{\log n}$, and in particular can compute

$$z = D^{j+1} x_{j+1} + D^{j+2} x_{j+2} + \cdots + D^{\log n} x_{\log n}.$$

Set $w = x - z = \sum_{i=1}^{j} D^i x_i$. Bob then computes A'z, and using A'x and linearity, A'w. Then

$$\|w\|_1 \le \sum_{i=1}^{j} k D^i < k \frac{D^{j+1}}{D - 1} < k D^{2 \log n}.$$

So from Lemma 5.1 there exists some s with A'w = A(w - s) and

$$\|s\|_1 < n^2 \, 2^{-4 \log n - 2 \log D \log n} \, \|w\|_1 < k/n^2.$$

Bob chooses another vector u uniformly from $B_1^n(k)$, the $\ell_1$ ball of radius k, and computes A(w - s - u) = A'w - Au.

Bob runs the estimation algorithm $\mathcal{A}$ on A and A(w - s - u), obtaining $\hat{w}$. We have that u is independent of w and s, and that $\|u\|_1 \le k(1 - 1/n^2) \le k - \|s\|_1$ with probability

$$\frac{\operatorname{Vol}(B_1^n(k(1 - 1/n^2)))}{\operatorname{Vol}(B_1^n(k))} = (1 - 1/n^2)^n > 1 - 1/n.$$

But $\{w - u \mid \|u\|_1 \le k - \|s\|_1\} \subseteq \{w - s - u \mid \|u\|_1 \le k\}$, so the ranges of the random variables w - s - u and w - u overlap in at least a 1 - 1/n fraction of their volumes. Therefore w - s - u and w - u have statistical distance at most 1/n. The distribution of w - u is independent of A, so running the recovery algorithm on A(w - u) would work with probability at least 3/4. Hence with probability at least 3/4 - 1/n > 2/3 (for n large enough), $\hat{w}$ satisfies the recovery criterion for w - u, meaning

$$\|w - u - \hat{w}\|_1 \le C \min_{k\text{-sparse } w'} \|w - u - w'\|_1.$$



Now,

$$\|D^j x_j - \hat{w}\|_1 \le \|w - u - D^j x_j\|_1 + \|w - u - \hat{w}\|_1$$
$$\le (1 + C) \|w - u - D^j x_j\|_1$$
$$\le (1 + C) \left( \|u\|_1 + \sum_{i=1}^{j-1} \|D^i x_i\|_1 \right)$$
$$\le (1 + C) k \sum_{i=0}^{j-1} D^i$$
$$< k \frac{(1 + C) D^j}{D - 1}$$
$$= k D^j / 2.$$

And since the minimum Hamming distance in X is k, this means $\|D^j x_j - \hat{w}\|_1 < \|D^j x' - \hat{w}\|_1$ for all $x' \in X$, $x' \ne x_j$. So Bob can correctly identify $x_j$ with probability at least 2/3. From $x_j$ he can recover $y^j$, and hence the bit $y_i$ that occurs in $y^j$.

Hence, Bob solves Augmented Indexing with probability at least 2/3 given the message A'x. The entries in A' and x are polynomially bounded integers (up to scaling of A'), and so each entry of A'x takes O(log n) bits to describe. Hence, the communication cost of this protocol is O(m log n). By Theorem 5.1, $m \log n = \Omega(k \log(n/k) \log n)$, or $m = \Omega(k \log(n/k))$. □
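The combinatorial core of the reduction (Alice's layered encoding and Bob's peeling of the known higher levels) can be illustrated on a toy instance. In the sketch below the codebook X, the constant C and the chunk indices are hypothetical, and Bob is handed w itself as a stand-in for the output $\hat{w}$ of the recovery algorithm; the sketching matrix, the rounding and the random shift u are deliberately left out.

```python
def encode(codewords, D):
    """Alice's vector x = sum_j D^j x_j, with levels j = 1, ..., len(codewords)."""
    n = len(codewords[0])
    return [sum(D ** (j + 1) * c[t] for j, c in enumerate(codewords)) for t in range(n)]


def bob_identify(w_hat, j, D, X):
    """Index of the x' in X minimizing ||D^j x' - w_hat||_1."""
    def l1(c):
        return sum(abs(D ** j * a - b) for a, b in zip(c, w_hat))
    return min(range(len(X)), key=lambda i: l1(X[i]))


# Toy codebook: 2-sparse binary vectors with minimum Hamming distance 4 >= k.
X = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
C = 1
D = 2 * C + 3
chosen = [2, 0, 1]                       # Alice's chunks, read as indices into X
x = encode([X[i] for i in chosen], D)

j = 2                                    # level that holds Bob's bit
# Bob rebuilds the levels above j from his suffix of y and subtracts them.
z = encode([[0] * 6] * j + [X[i] for i in chosen[j:]], D)
w = [a - b for a, b in zip(x, z)]        # w = sum_{i <= j} D^i x_i
print(bob_identify(w, j, D, X))
```

Because D >= 3, the lower levels contribute less than $k D^j / 2$ in $\ell_1$ norm, so the nearest scaled codeword is the one Alice placed at level j; the proof shows the same holds when w is replaced by an approximate recovery $\hat{w}$.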
A Proof of Lemma 3.1

Proof. We will construct a codebook T of block length k, alphabet q, and minimum Hamming distance $\epsilon k$. Replacing each character i with the q-long standard basis vector $e_i$ will create a binary qk-dimensional codebook S with minimum Hamming distance $2\epsilon k$ of the same size as T, where each element of S has exactly k ones.

The Gilbert-Varshamov bound, based on volumes of Hamming balls, states that a codebook of size L exists for some

$$L \ge \frac{q^k}{\sum_{i=0}^{\epsilon k - 1} \binom{k}{i} (q - 1)^i}.$$

Using the claim (analogous to [vL98], p. 21, proven below) that for $\epsilon < 1 - 1/q$

$$\sum_{i=0}^{\epsilon k} \binom{k}{i} (q - 1)^i < q^{H_q(\epsilon) k},$$

we have that $\log L > (1 - H_q(\epsilon)) k \log q$, as desired. □

B Proof of Lemma 4.1

Proof. By standard arguments (see, e.g., [IN07]), for any D > 0 we have

$$\Pr\left[ \|A(y_1 - y_2)\|_2 \le \frac{\|y_1 - y_2\|_2}{D} \right] \le \left( \frac{3}{D} \right)^m$$

and

$$\Pr[\|Az\|_2 \ge D \|z\|_2] \le e^{-m(D - 1)^2 / 8}.$$

Setting both right-hand sides to $\delta$ yields the lemma. □

C Proof of Lemma 4.2

Proof. Consider the distribution of a single coordinate of z, say, $z_1$. The probability density of $|z_1|$ taking value $t \in [0, s]$ is proportional to the (n - 1)-dimensional volume of $B_1^{(n-1)}(s - t)$, which in turn is proportional to $(s - t)^{n-1}$. Normalizing to ensure the probability integrates to 1, we derive this probability as

$$p(|z_1| = t) = \frac{n}{s^n} (s - t)^{n-1}.$$

It follows that, for any $D \in [0, s]$,

$$\Pr[|z_1| > D] = \int_D^s \frac{n}{s^n} (s - t)^{n-1} \, dt = (1 - D/s)^n.$$

In particular, for any $\alpha > 1$,

$$\Pr[|z_1| > \alpha s \log n / n] = (1 - \alpha \log n / n)^n < e^{-\alpha \log n} = 1/n^{\alpha}.$$

Now, by symmetry this holds for every other coordinate $z_i$ of z as well, so by the union bound

$$\Pr[\|z\|_\infty > \alpha s \log n / n] < 1/n^{\alpha - 1},$$

and since $\|z\|_2 \le \sqrt{n} \cdot \|z\|_\infty$ for any vector z, the lemma follows. □